Abstract
DNA methylation has been functionally implicated in X-inactivation, genomic imprinting, and silencing of transposable elements. It also has a complex regulatory relationship with gene expression. Canonically, methylation around the promoters of tumor-suppressor genes induces gene silencing, thereby representing one hit in the two-hit hypothesis for the development of cancer. However, profiling studies that aim to determine how aberrant methylation contributes to cancer progression are confounded by heterogeneity within the original clinical sample. Thus, although studies in patients with acute myeloid leukemia (AML), diffuse large B-cell lymphoma (DLBCL), and chronic lymphocytic leukemia (CLL) have found that variation in methylation values detected at diagnosis correlates with prognosis following therapy, they do not address which subclonal methylation events contribute to cancer progression.
To address this concern, we developed a novel computational method that deconvolves the bisulfite sequencing data from a sample into its major methylation profiles and their respective prevalences in the sample. Our method, based on a modified Hidden Markov Model, effectively models the autocorrelation found in methylation data and outperforms existing algorithms. We validated the method across a wide range of mixture simulations, in which bisulfite sequencing reads from different cell types were subsampled to form test samples for deconvolution. We were able to accurately (98%) distinguish distinct methylation patterns corresponding to the expected underlying subpopulations, for example at CD14 and CD22 in mixtures of germinal center B-cells and monocytes, and at CD4 and CD8A in mixtures of CD4+ and CD8+ T-lymphocytes. These patterns also recapitulated differentially methylated regions (DMRs) identified by an independent DMR caller.
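The mixture simulations described above can be illustrated with a minimal sketch. It assumes each read is reduced to a tuple of binary methylation calls at consecutive CpGs; the function name `simulate_mixture`, the cell-type labels, and the read representation are hypothetical choices for illustration and are not taken from the authors' pipeline.

```python
import random
from typing import Dict, List, Tuple

# Assumed representation: one read = methylation calls at consecutive CpGs
# (1 = methylated, 0 = unmethylated).
Read = Tuple[int, ...]


def simulate_mixture(
    reads_by_cell_type: Dict[str, List[Read]],
    proportions: Dict[str, float],
    n_reads: int,
    seed: int = 0,
) -> List[Tuple[str, Read]]:
    """Subsample reads from each cell type at known proportions."""
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    rng = random.Random(seed)
    labels = list(proportions)
    counts = {label: int(proportions[label] * n_reads) for label in labels}
    # Reads lost to truncation go to the most prevalent cell type.
    top = max(labels, key=lambda label: proportions[label])
    counts[top] += n_reads - sum(counts.values())

    mixture: List[Tuple[str, Read]] = []
    for label in labels:
        pool = reads_by_cell_type[label]
        if counts[label] > len(pool):
            raise ValueError(f"not enough reads for {label}")
        # Subsample without replacement; keep the label as ground truth.
        mixture.extend((label, read) for read in rng.sample(pool, counts[label]))
    rng.shuffle(mixture)
    return mixture
```

In a benchmark of this kind, the labels would be withheld from the deconvolution step and used afterwards to score the recovered profiles and their estimated prevalences against the known mixing proportions.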
Because our method does not rely on cell-type-specific parameters, it is robust across sample types. To further validate it and demonstrate its applicability, we performed Agilent Methyl-Seq on 5 primary DLBCL samples procured by the Lymphoma Core at the Siteman Cancer Center. As a positive control, our method identified distinct methylation profiles at loci expected to differ between the underlying CD19+ and CD4+ cells, which together comprise a large majority of each sample. Our method also identified methylation profiles not found in reference profiles from normal cell types, suggesting that these profiles may be specific to DLBCL. To further validate these findings, we used single-cell bisulfite sequencing at ten loci to demonstrate that the methylation profiles predicted by our method from the original sample are found in individual cells. We found several methylation patterns that existed only in a subset of CD19+ cells, which may represent distinct epigenetic subclones of DLBCL.
Using our novel computational method, we next profiled the subclonal epigenetic architecture of publicly available (dbGaP) paired samples from patients with AML (n=137) at diagnosis and following therapy. We not only identified subclonal methylation profiles that were specific to cancer but also found profiles at higher prevalence at relapse than at diagnosis. These methylation profiles, which were enriched for genes in cancer pathways by Gene Set Enrichment Analysis, may confer fitness advantages that allow a cancer subclone to expand. We are currently conducting additional analyses to characterize the epigenetic regulatory circuits that contribute to the observed increase in subclonal fitness. In summary, we have developed a robust method to identify subclonal methylation changes that may contribute to cancer progression and prognosis, as seen in AML, and that may lead to new avenues for improving treatment for patients with leukemia or lymphoma.
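The paired diagnosis-versus-relapse comparison can be pictured with a small sketch. It assumes that, for one methylation profile, a prevalence estimate is available for each patient at both time points. The function name `prevalence_shift` and the summary fields are hypothetical, and no statistical test is implied; the abstract does not describe how the authors made this comparison.

```python
from statistics import mean
from typing import Dict, List, Tuple


def prevalence_shift(pairs: List[Tuple[float, float]]) -> Dict[str, float]:
    """Summarize how one profile's prevalence changes across paired samples.

    Each pair is (prevalence at diagnosis, prevalence at relapse) for one patient.
    """
    if not pairs:
        raise ValueError("at least one paired sample is required")
    # Per-patient change in prevalence from diagnosis to relapse.
    deltas = [relapse - diagnosis for diagnosis, relapse in pairs]
    return {
        "mean_delta": mean(deltas),
        "n_increased": sum(1 for d in deltas if d > 0),
        "n_patients": len(deltas),
    }
```

A profile whose prevalence rises in most patients would be a candidate for follow-up, subject to an appropriate paired statistical test and correction for multiple comparisons.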
No relevant conflicts of interest to declare.